

Distributional Policy Evaluation: a Maximum Entropy approach to Representation Learning

Neural Information Processing Systems

The Maximum Entropy (Max-Ent) framework has been effectively employed in a variety of Reinforcement Learning (RL) tasks. In this paper, we first propose a novel Max-Ent framework for policy evaluation in a distributional RL setting, named Distributional Maximum Entropy Policy Evaluation (D-Max-Ent PE). We derive a generalization-error bound that depends on the complexity of the representation employed, showing that this framework can explicitly take into account the features used to represent the state space while evaluating a policy. Then, we exploit these favorable properties to drive the representation learning of the state space in a Structural Risk Minimization fashion. We employ state-aggregation functions as feature functions and we specialize the D-Max-Ent approach into an algorithm, named D-Max-Ent Progressive Factorization, which constructs a progressively finer-grained representation of the state space by balancing the trade-off between preserving information (bias) and reducing the effective number of states, i.e., the complexity of the representation space (variance). Finally, we report the results of some illustrative numerical simulations, showing that the proposed algorithm matches the expected theoretical behavior and highlighting the relationship between aggregations and sample regimes.
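The state-aggregation idea behind the abstract can be sketched in a few lines: an aggregation maps states to groups, the indicator features of those groups serve as feature functions, and "progressive factorization" corresponds to splitting a group into finer ones. This is an illustrative toy, with hypothetical names, not the paper's implementation:

```python
import numpy as np

def aggregation_features(assignment, n_groups):
    """One-hot indicator features phi(s): each state maps to its group."""
    def phi(state):
        v = np.zeros(n_groups)
        v[assignment[state]] = 1.0
        return v
    return phi

def split_group(assignment, group, states_to_move, new_group):
    """Refine the partition: move selected states out of `group` into a new group,
    trading a larger representation (variance) for less information loss (bias)."""
    refined = dict(assignment)
    for s in states_to_move:
        assert refined[s] == group
        refined[s] = new_group
    return refined

# 4 states aggregated into 2 groups, then progressively refined into 3
coarse = {0: 0, 1: 0, 2: 1, 3: 1}
phi = aggregation_features(coarse, 2)
finer = split_group(coarse, group=1, states_to_move=[3], new_group=2)
```

Each refinement step increases the effective number of states by one, which is the bias/variance knob the abstract describes.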


Deriving the Scaled-Dot-Function via Maximum Likelihood Estimation and Maximum Entropy Approach

Ma, Jiyong

arXiv.org Artificial Intelligence

In this paper, we present a maximum likelihood estimation approach to determine the value vector in transformer models. We model the sequence of value vectors, key vectors, and the query vector as a sequence of Gaussian distributions. The variance in each Gaussian distribution depends on the time step, the corresponding key vector, and the query vector. The mean value in each Gaussian distribution depends on the time step and the corresponding value vector. This analysis may offer a new explanation of the scaled-dot-product function or softmax function used in transformer architectures [1]. Another explanation, inspired by [4], is based on the maximum entropy approach in natural language processing [5]. In this approach, a query vector and key vectors are used to derive the feature functions for the maximum entropy model.
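The scaled-dot-product function the abstract seeks to explain is standard; a minimal numpy sketch for a single query vector:

```python
import numpy as np

def scaled_dot_product_attention(q, K, V):
    """Compute softmax(K q / sqrt(d)) @ V for one query vector q.

    The 1/sqrt(d) scaling is exactly the quantity the maximum-likelihood and
    maximum-entropy derivations discussed in the abstract aim to justify.
    """
    d = q.shape[-1]
    scores = K @ q / np.sqrt(d)           # one score per key
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()              # softmax over keys
    return weights @ V                    # convex combination of value vectors
```

With a zero query, all scores are equal, so the output is the plain mean of the value vectors; the key-dependent variances in the paper's Gaussian model modulate exactly these weights.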


A Maximum Entropy Approach to Collaborative Filtering in Dynamic, Sparse, High-Dimensional Domains

Pavlov, Dmitry Y., Pennock, David M.

Neural Information Processing Systems

We develop a maximum entropy (maxent) approach to generating recommendations in the context of a user's current navigation stream, suitable for environments where data is sparse, high-dimensional, and dynamic -- conditions typical of many recommendation applications. We address sparsity and dimensionality reduction by first clustering items based on user access patterns so as to attempt to minimize the a priori probability that recommendations will cross cluster boundaries and then recommending only within clusters. We address the inherent dynamic nature of the problem by explicitly modeling the data as a time series; we show how this representational expressivity fits naturally into a maxent framework. We conduct experiments on data from ResearchIndex, a popular online repository of over 470,000 computer science documents. We show that our maxent formulation outperforms several competing algorithms in offline tests simulating the recommendation of documents to ResearchIndex users.
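The cluster-then-recommend-within-cluster step can be illustrated with a toy 2-means on item access patterns. This is a simplified stand-in (farthest-point initialization, popularity-based pick), not the authors' clustering or maxent model:

```python
import numpy as np

def cluster_items(access, n_iters=20):
    """Toy 2-means over item rows of a binary (items x users) access matrix.

    Items co-accessed by similar users land in the same cluster; the second
    center is initialized at the item farthest from the first (deterministic).
    """
    a = access.astype(float)
    c0 = a[0]
    c1 = a[np.argmax(((a - c0) ** 2).sum(-1))]
    centers = np.stack([c0, c1])
    for _ in range(n_iters):
        d = ((a[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = d.argmin(axis=1)
        for k in range(2):
            if (labels == k).any():
                centers[k] = a[labels == k].mean(axis=0)
    return labels

def recommend_within_cluster(labels, current_item, popularity):
    """Recommend the most popular other item in the current item's cluster."""
    k = labels[current_item]
    candidates = np.where(labels == k)[0]
    candidates = candidates[candidates != current_item]
    return int(candidates[np.argmax(popularity[candidates])])

# two obvious co-access groups: items {0,1} and items {2,3}
access = np.array([[1, 1, 0, 0],
                   [1, 1, 0, 0],
                   [0, 0, 1, 1],
                   [0, 0, 1, 1]])
labels = cluster_items(access)
```

Restricting candidates to one cluster is what keeps the subsequent maxent model low-dimensional, per the sparsity argument in the abstract.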


A Maximum Entropy approach to Massive Graph Spectra

Granziol, Diego, Ru, Robin, Zohren, Stefan, Dong, Xiaowen, Osborne, Michael, Roberts, Stephen

arXiv.org Machine Learning

Graph spectral techniques for measuring graph similarity, or for learning the cluster number, require kernel smoothing. The choice of kernel function and bandwidth is typically made in an ad-hoc manner and heavily affects the resulting output. We prove that kernel smoothing biases the moments of the spectral density. We propose an information-theoretically optimal approach to learn a smooth graph spectral density which fully respects the moment information. Our method's computational cost is linear in the number of edges, and hence it can be applied to large networks with millions of nodes. We apply our method to the problems of graph similarity and cluster-number learning, where we outperform comparable iterative spectral approaches on synthetic and real graphs.
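The moment-respecting maximum-entropy density has a standard exponential-family form: on a discretized eigenvalue support, p_i ∝ exp(Σ_k α_k λ_i^k), with the multipliers α found by minimizing the convex dual log Z(α) − α·m, whose gradient is exactly the moment mismatch. A minimal sketch of that fit (an illustrative reimplementation under these assumptions, not the authors' code):

```python
import numpy as np
from scipy.optimize import minimize

def maxent_spectral_density(grid, moments):
    """Discrete max-entropy density on `grid` matching E[lambda^k] = moments[k-1].

    p_i is proportional to exp(sum_k alpha_k * grid_i**k); the dual
    log Z(alpha) - alpha . moments is convex and is minimized over alpha.
    """
    powers = np.stack([grid ** (k + 1) for k in range(len(moments))], axis=1)

    def dual(alpha):
        logits = powers @ alpha
        m = logits.max()
        return m + np.log(np.exp(logits - m).sum()) - alpha @ moments

    res = minimize(dual, np.zeros(len(moments)), method="BFGS")
    logits = powers @ res.x
    p = np.exp(logits - logits.max())
    return p / p.sum()
```

In the large-graph setting of the abstract, the input moments would come from stochastic trace estimation at cost linear in the number of edges; here they are simply given.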


Learning partially ranked data based on graph regularization

Nakamura, Kento, Yano, Keisuke, Komaki, Fumiyasu

arXiv.org Machine Learning

Ranked data appear in many different applications, including voting and consumer surveys. In practice, data are often only partially ranked, and partially ranked data can be viewed as missing data. This paper addresses parameter estimation for partially ranked data under a (possibly) non-ignorable missing mechanism. We propose estimators for both complete rankings and missing mechanisms, together with a simple estimation procedure that combines a graph regularization with the Expectation-Maximization algorithm. The procedure is theoretically guaranteed to converge. Allowing a non-ignorable missing mechanism reduces modeling bias, while the graph regularization avoids the complexity inherent in such mechanisms. The experimental results demonstrate that the proposed estimators work well under non-ignorable missing mechanisms.
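The "partial rankings as missing data" view can be made concrete with a toy sketch (not the paper's estimator): a partial ranking is compatible with every complete ranking that extends it, and an EM-style E-step averages over these completions.

```python
from itertools import permutations

def compatible_completions(partial, items):
    """All complete rankings (permutations of `items`) whose relative order
    of the observed items agrees with the partial ranking `partial`.

    E.g. observing 'a ranked above b' over items {a, b, c} leaves the
    position of 'c' as missing data to be averaged over in an E-step.
    """
    observed = list(partial)
    out = []
    for perm in permutations(items):
        order = [x for x in perm if x in observed]
        if order == observed:
            out.append(perm)
    return out
```

Enumerating completions is only feasible for tiny item sets; the paper's graph regularization is precisely what keeps estimation tractable without such enumeration.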

